1. A method of processing text data, comprising the steps of:
inputting text data; 
parsing the text data into word candidates;

removing predetermined words from the word candidates; 
specifying an area of a predetermined text database; and 

determining a specific area occurrence value of each of the word candidates in the 
specified area in the predetermined text database. 
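
The parsing and removal steps of claim 1 might be sketched as follows. The tokenizer pattern, the `STOP_WORDS` set, and the function name `word_candidates` are illustrative assumptions; the claim does not specify how the text is parsed or which words are predetermined.

```python
import re

# Hypothetical set of "predetermined words" to remove (an assumption,
# not taken from the claims).
STOP_WORDS = {"a", "an", "the", "of", "in", "and", "is", "to"}

def word_candidates(text: str) -> list[str]:
    """Parse text into lowercase word candidates and drop the predetermined words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

print(word_candidates("The processing of text data in a database"))
# ['processing', 'text', 'data', 'database']
```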

2. The method of processing text data according to claim 1 wherein the specified area is a 
header area. 



3. The method of processing text data according to claim 2 wherein the specific area 
occurrence value is determined according to a following equation:

the specific area occurrence value = 

a number of documents including the word candidate in the header area / 
a number of documents including the word candidate in an entire portion of the 
predetermined text database.
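
As a minimal sketch of the ratio in claim 3, assume each document in the text database is a dictionary with hypothetical `"header"` and `"body"` fields. The field names, the helper `contains`, and the choice to return 0.0 when the word occurs nowhere are assumptions made for illustration only.

```python
import re

def contains(word: str, text: str) -> bool:
    """True if `word` appears as a whole token in `text` (case-insensitive)."""
    return word.lower() in re.findall(r"[a-z0-9]+", text.lower())

def header_occurrence_value(word: str, docs: list[dict]) -> float:
    """Documents with `word` in the header / documents with `word` anywhere."""
    in_header = sum(1 for d in docs if contains(word, d["header"]))
    anywhere = sum(1 for d in docs if contains(word, d["header"] + " " + d["body"]))
    # Returning 0.0 for an unseen word is an assumption; the claim is silent on it.
    return in_header / anywhere if anywhere else 0.0

docs = [
    {"header": "Text retrieval", "body": "Retrieval of text from a database."},
    {"header": "Image coding", "body": "Coding relies on text labels."},
    {"header": "Audio", "body": "No relevant words here."},
]
print(header_occurrence_value("text", docs))  # 1 of 2 documents -> 0.5
```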

4. The method of processing text data according to claim 1 wherein the specified area is a 
summary area. 

5. The method of processing text data according to claim 4 wherein the specific area
occurrence value is determined according to a following equation: 

the specific area occurrence value = 

a number of documents including the word candidate in the summary area / 
a number of documents including the word candidate in an entire portion of the

predetermined text database. 

6. The method of processing text data according to claim 1 wherein the specified area is a
combination of a header area and a summary area. 

7. The method of processing text data according to claim 6 wherein the specific area 
occurrence value is determined according to a following equation: 

the specific area occurrence value = 

a number of documents including the word candidate in either one of the 
summary area and the header area /

a number of documents including the word candidate in an entire portion of the 
predetermined text database. 

8. The method of processing text data according to claim 6 wherein the specific area 
occurrence value is determined according to a following equation: 

the specific area occurrence value = 

(a number of documents including the word candidate in the header area / 

a number of documents including the word candidate in an entire portion of the 

predetermined text database) + 

(a number of documents including the word candidate in the summary area / 
a number of documents including the word candidate in an entire portion of the 
predetermined text database).
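
The two combined-area equations of claims 7 and 8 give different results when a document contains the word in both areas. The sketch below represents each count as a set of document identifiers, which is an assumption, and it reads "either one of" in claim 7 as an inclusive "or", which is also an assumption.

```python
def either_area_value(header_docs: set, summary_docs: set, all_docs: set) -> float:
    """Claim 7 style: documents with the word in the header or the summary,
    divided by documents with the word anywhere."""
    return len(header_docs | summary_docs) / len(all_docs) if all_docs else 0.0

def summed_area_value(header_docs: set, summary_docs: set, all_docs: set) -> float:
    """Claim 8 style: header ratio plus summary ratio. A document with the
    word in both areas contributes to both terms."""
    if not all_docs:
        return 0.0
    return len(header_docs) / len(all_docs) + len(summary_docs) / len(all_docs)

header_docs = {1, 2}       # documents with the word in the header area
summary_docs = {2, 3}      # documents with the word in the summary area
all_docs = {1, 2, 3, 4}    # documents with the word anywhere

print(either_area_value(header_docs, summary_docs, all_docs))  # 0.75
print(summed_area_value(header_docs, summary_docs, all_docs))  # 1.0
```

Under these assumptions the claim 7 value never exceeds 1, while the claim 8 value can reach 2 when every document has the word in both areas.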

9. The method of processing text data according to claim 1 further comprising an 
additional step of determining a search word significance value based upon a following 
equation: 

the search word significance value = 

a corresponding predetermined word weight X 

the specific area occurrence value, 
wherein the corresponding predetermined word weight is log (a total number of 
documents/ the number of documents in which the word candidate occurs). 
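
The product in claim 9 can be sketched as below. The claim does not state the base of the logarithm; the natural logarithm is assumed here, and the function names are hypothetical.

```python
import math

def word_weight(total_docs: int, docs_with_word: int) -> float:
    """log(total number of documents / documents in which the word occurs)."""
    return math.log(total_docs / docs_with_word)  # natural log is an assumption

def significance(total_docs: int, docs_with_word: int, area_value: float) -> float:
    """Search word significance = word weight x specific area occurrence value."""
    return word_weight(total_docs, docs_with_word) * area_value

# A word found in 10 of 1000 documents, with an area occurrence value of 0.5.
print(round(significance(1000, 10, 0.5), 4))
```

A word that occurs in every document receives a weight of log(1) = 0, so its significance is zero regardless of its area occurrence value.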

10. The method of processing text data according to claim 1 further comprising an
additional step of: 

determining a search word significance value based upon a following equation:

the search word significance value = 

a corresponding predetermined word weight X 
the specific area occurrence value X 
a number of occurrences of the word candidate within the text data.
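
A sketch of how the three-factor product of claim 10 could rank the candidates from one input text. The dictionaries `doc_freq` and `area_value` stand in for statistics taken from the text database and are hypothetical, as is the decision to skip words that never occur in the database (the weight is undefined for them).

```python
import math
from collections import Counter

def rank_candidates(candidates, total_docs, doc_freq, area_value):
    """Score each candidate as word weight x area occurrence value x
    occurrences in the input text, highest score first."""
    counts = Counter(candidates)
    scores = {}
    for word, occurrences in counts.items():
        df = doc_freq.get(word, 0)
        if df == 0:
            continue  # assumption: skip words absent from the database
        weight = math.log(total_docs / df)  # natural log is an assumption
        scores[word] = weight * area_value.get(word, 0.0) * occurrences
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

ranked = rank_candidates(
    ["retrieval", "text", "retrieval", "index"],
    total_docs=100,
    doc_freq={"retrieval": 10, "text": 50, "index": 100},
    area_value={"retrieval": 0.5, "text": 0.2, "index": 0.9},
)
print([word for word, _ in ranked])  # ['retrieval', 'text', 'index']
```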

11. The method of processing text data according to claim 1 further comprising additional
steps of: 

selecting search words from the word candidates based upon the specific area
occurrence value; and

extracting sentences from the predetermined text database based upon the selected 
search words. 
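
The selection and extraction steps of claim 11 might look like the following. Choosing the `top_k` highest-valued candidates and splitting sentences on terminal punctuation are both assumptions; the claim only requires that selection be based upon the specific area occurrence value.

```python
import re

def select_search_words(area_values: dict, top_k: int) -> list:
    """Pick the top_k candidates by specific area occurrence value."""
    return sorted(area_values, key=area_values.get, reverse=True)[:top_k]

def extract_sentences(documents: list, search_words: list) -> list:
    """Return every sentence that contains at least one selected search word."""
    wanted = set(search_words)
    hits = []
    for doc in documents:
        # Naive sentence split after '.', '!' or '?' (an assumption).
        for sentence in re.split(r"(?<=[.!?])\s+", doc.strip()):
            tokens = set(re.findall(r"[a-z0-9]+", sentence.lower()))
            if tokens & wanted:
                hits.append(sentence)
    return hits

words = select_search_words({"retrieval": 0.8, "database": 0.6, "method": 0.1}, top_k=2)
sentences = extract_sentences(
    ["A retrieval system is described. The method is simple.",
     "Each database stores documents."],
    words,
)
print(words)      # ['retrieval', 'database']
print(sentences)  # ['A retrieval system is described.', 'Each database stores documents.']
```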

12. The method of processing text data according to claim 1 further comprising an
additional step of selecting keywords from the word candidates based upon the specific
area occurrence value.

13. The method of processing text data according to claim 1 further comprising additional 
steps of:

selecting keywords from the word candidates based upon the specific area 
occurrence value; and 

generating a summary from the predetermined text database based upon the 
selected keywords. 

14. The method of processing text data according to claim 1 further comprising additional 
steps of: 
selecting classification keywords from the word candidates based upon the
specific area occurrence value; and 

classifying the predetermined text database based upon the selected classification 
keywords. 
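
One possible reading of the classifying step of claim 14 is to group documents under each classification keyword that they contain. The claim does not say how classification is carried out, so the grouping rule, the `"unclassified"` bucket, and the function name `classify` are all assumptions.

```python
import re

def classify(documents: dict, keywords: list) -> dict:
    """Map each classification keyword to the ids of documents containing it."""
    classes = {kw: [] for kw in keywords}
    classes["unclassified"] = []  # hypothetical bucket for documents with no match
    for doc_id, text in documents.items():
        tokens = set(re.findall(r"[a-z0-9]+", text.lower()))
        matched = [kw for kw in keywords if kw in tokens]
        for kw in matched:
            classes[kw].append(doc_id)
        if not matched:
            classes["unclassified"].append(doc_id)
    return classes

result = classify(
    {"d1": "Speech recognition with neural networks",
     "d2": "Image compression",
     "d3": "Speech and image coding"},
    ["speech", "image"],
)
print(result)
# {'speech': ['d1', 'd3'], 'image': ['d2', 'd3'], 'unclassified': []}
```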

15. The method of processing text data according to claim 1 further comprising additional 
steps of: 

determining a first text database occurrence value of the word candidates in a first 
text database; 

determining a second text database occurrence value of the word candidates in a
second text database;

determining a database occurrence value based upon the first text database 
occurrence value and the second text database occurrence value in a predetermined 
manner; 

selecting search words from the word candidates based upon in part the database
occurrence value; and

extracting sentences from a predetermined text database based upon the selected 
search words. 

16. The method of processing text data according to claim 15 wherein the database
occurrence value is determined by a following equation:

the database occurrence value = 

(the second text database occurrence value / 
a total number of documents in the second text database) -

(the first text database occurrence value / 
a total number of documents in the first text database). 

17. The method of processing text data according to claim 15 wherein the database
occurrence value is determined by a following equation:

the database occurrence value = 
(the second text database occurrence value /

a total number of documents in the second text database) / 

(the first text database occurrence value / 

a total number of documents in the first text database). 
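
The difference form of claim 16 and the ratio form of claim 17 can be compared side by side. The sketch assumes that each "text database occurrence value" is the number of documents in that database containing the word candidate; the claims do not define it further. The ratio form is undefined when the word is absent from the first database, and guarding against that case is left to the caller here.

```python
def database_occurrence_difference(first_count, first_total, second_count, second_total):
    """Claim 16 form: second-database rate minus first-database rate."""
    return second_count / second_total - first_count / first_total

def database_occurrence_ratio(first_count, first_total, second_count, second_total):
    """Claim 17 form: second-database rate divided by first-database rate.
    Raises ZeroDivisionError if first_count is 0."""
    return (second_count / second_total) / (first_count / first_total)

# Word occurs in 50 of 1000 documents in the first database (rate 0.05)
# and in 40 of 200 documents in the second database (rate 0.2).
diff = database_occurrence_difference(50, 1000, 40, 200)
ratio = database_occurrence_ratio(50, 1000, 40, 200)
print(round(diff, 2), round(ratio, 2))
```

A positive difference, or a ratio above 1, indicates that the word is relatively more common in the second database than in the first.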

18. The method of processing text data according to claim 15 further comprising an
additional step of determining a search word significance value based upon a following 
equation: 

the search word significance value =

the corresponding predetermined word weight X 

the database occurrence value, 
wherein the corresponding predetermined word weight is log (a total number of 
documents/ the number of documents in which the word candidate occurs). 

19. A method of processing text data, comprising the steps of: 

inputting text data; 

parsing the text data into word candidates; 
removing predetermined words from the word candidates; 
determining a first text database occurrence value of the word candidates in a first
text database;

determining a second text database occurrence value of the word candidates in a 
second text database; 

determining a database occurrence value based upon the first text database
occurrence value and the second text database occurrence value in a predetermined
manner;

selecting search words from the word candidates based upon in part the database 
occurrence value; and 

extracting sentences from a predetermined text database based upon the selected
search words.

20. The method of processing text data according to claim 19 wherein the database
occurrence value is determined by a following equation: 

the database occurrence value = 
(the second text database occurrence value /

a total number of documents in the second text database) - 

(the first text database occurrence value / 

a total number of documents in the first text database). 

21. The method of processing text data according to claim 19 wherein the database
occurrence value is determined by a following equation: 

the database occurrence value = 

(the second text database occurrence value / 
a total number of documents in the second text database) /

(the first text database occurrence value / 
a total number of documents in the first text database). 

22. The method of processing text data according to claim 19 further comprising an
additional step of determining a search word significance value based upon a following
equation:

the search word significance value =

the corresponding predetermined word weight X
the database occurrence value,
wherein the corresponding predetermined word weight is log (a total number of
documents/ the number of documents in which the word candidate occurs).

23. A computer program for processing text data, performing the tasks of: 

inputting text data; 

parsing the text data into word candidates;

removing predetermined words from the word candidates; 
specifying an area of a predetermined text database; and 
determining a specific area occurrence value of each of the word candidates in the
specified area in the predetermined text database in a predetermined manner. 



24. The computer program for processing text data according to claim 23 wherein the 
specified area is a header area.

25. The computer program for processing text data according to claim 24 wherein the 
specific area occurrence value is determined according to a following equation: 

the specific area occurrence value =

a number of documents including the word candidate in the header area / 

a number of documents including the word candidate in an entire portion of the 

predetermined text database. 

26. The computer program for processing text data according to claim 23 wherein the
specified area is a summary area. 

27. The computer program for processing text data according to claim 26 wherein the 
specific area occurrence value is determined according to a following equation: 

the specific area occurrence value = 

a number of documents including the word candidate in the summary area / 
a number of documents including the word candidate in an entire portion of the 
predetermined text database. 

28. The computer program for processing text data according to claim 23 wherein the 
specified area is a combination of a header area and a summary area. 

29. The computer program for processing text data according to claim 28 wherein the
specific area occurrence value is determined according to a following equation:

the specific area occurrence value = 
a number of documents including the word candidate in either one of the
summary area and the header area / 

a number of documents including the word candidate in an entire portion of the 
predetermined text database. 

30. The computer program for processing text data according to claim 28 wherein the 
specific area occurrence value is determined according to a following equation: 

the specific area occurrence value = 

(a number of documents including the word candidate in the header area / 

a number of documents including the word candidate in an entire portion of the 

predetermined text database) + 

(a number of documents including the word candidate in the summary area / 
a number of documents including the word candidate in an entire portion of the 
predetermined text database). 

31. The computer program for processing text data according to claim 23 further
performing an additional task of determining a search word significance value based upon
a following equation: 

the search word significance value = 

a corresponding predetermined word weight X 

the specific area occurrence value, 
wherein the corresponding predetermined word weight is log (a total number of 
documents/ the number of documents in which the word candidate occurs). 

32. The computer program for processing text data according to claim 23 further 
performing an additional task of determining a search word significance value based upon 
a following equation: 

the search word significance value = 

a corresponding predetermined word weight X 
the specific area occurrence value X

a number of occurrences of the word candidate within the text data. 

33. The computer program for processing text data according to claim 23 further
performing additional tasks of: 

selecting search words from the word candidates based upon the specific area 
occurrence value; and 

extracting sentences from the predetermined text database based upon the selected 
search words. 

34. The computer program for processing text data according to claim 23 further 
performing an additional task of selecting keywords from the word candidates based upon 
the specific area occurrence value. 

35. The computer program for processing text data according to claim 23 further 
performing additional tasks of: 

selecting keywords from the word candidates based upon the specific area 
occurrence value; and 

generating a summary from the predetermined text database based upon the 
selected keywords. 

36. The computer program for processing text data according to claim 23 further 
performing additional tasks of: 

selecting classification keywords from the word candidates based upon the 
specific area occurrence value; and 

classifying the predetermined text database based upon the selected classification 
keywords. 

37. The computer program for processing text data according to claim 23 further 
performing additional tasks of:

determining a first text database occurrence value of the word candidates in a first 
text database; 
determining a second text database occurrence value of the word candidates in a
second text database; 

determining a database occurrence value based upon the first text database 
occurrence value and the second text database occurrence value in a predetermined 
manner;

selecting search words from the word candidates based upon in part the database 
occurrence value; and 

extracting sentences from the predetermined text database based upon the selected 
search words. 

38. The computer program for processing text data according to claim 37 wherein the 
database occurrence value is determined by a following equation: 

the database occurrence value = 

(the second text database occurrence value /
a total number of documents in the second text database) -
(the first text database occurrence value /
a total number of documents in the first text database).

39. The computer program for processing text data according to claim 37 wherein the 
database occurrence value is determined by a following equation:

the database occurrence value = 

(the second text database occurrence value / 

a total number of documents in the second text database) / 

(the first text database occurrence value /

a total number of documents in the first text database). 

40. The computer program for processing text data according to claim 37 further
performing an additional task of determining a search word significance value based upon
a following equation:

the search word significance value = 
the corresponding predetermined word weight X

the database occurrence value, 
wherein the corresponding predetermined word weight is log (a total number of 
documents/ the number of documents in which the word candidate occurs). 

41. A computer program for processing text data, performing the tasks of:

inputting text data; 

parsing the text data into word candidates; 
removing predetermined words from the word candidates; 
determining a first text database occurrence value of the word candidates in a first
text database;

determining a second text database occurrence value of the word candidates in a 
second text database; 

determining a database occurrence value based upon the first text database 
occurrence value and the second text database occurrence value in a predetermined
manner; 

selecting search words from the word candidates based upon in part the database 
occurrence value; and 

extracting sentences from the predetermined text database based upon the selected 
search words.

42. The computer program for processing text data according to claim 41 wherein the 
database occurrence value is determined by a following equation: 

the database occurrence value =

(the second text database occurrence value / 

a total number of documents in the second text database) - 

(the first text database occurrence value / 

a total number of documents in the first text database). 

43. The computer program for processing text data according to claim 41 wherein the 
database occurrence value is determined by a following equation: 

the database occurrence value =

(the second text database occurrence value / 
a total number of documents in the second text database) / 
(the first text database occurrence value /

a total number of documents in the first text database). 

44. The computer program for processing text data according to claim 41 further
performing an additional task of determining a search word significance value based upon
a following equation:

the search word significance value = 

the corresponding predetermined word weight X 

the database occurrence value, 
wherein the corresponding predetermined word weight is log (a total number of
documents/ a number of documents including the word candidate in an entire portion of the 
predetermined text database). 

45. An apparatus for processing text data, comprising:
an input unit for inputting text data;

a search word selection unit connected to said input unit for parsing the text data 
into word candidates, said search word selection unit removing predetermined words from 
the word candidates; 

an area specification unit for specifying an area of a predetermined text database;
and

a specific area occurrence determination unit connected to said search word 
selection unit and said area specification unit for determining a specific area occurrence 
value of each of the word candidates in the specified area in the predetermined text 
database. 
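
The connected units of claim 45 could be modeled in software as small classes that hold references to one another. This is only an illustrative sketch: the class names, the dictionary layout of database documents, the stop-word set, and the use of the area ratio as the occurrence value are assumptions, since claim 45 itself does not fix how the value is determined.

```python
import re
from dataclasses import dataclass, field

def _tokens(text: str) -> set:
    return set(re.findall(r"[a-z0-9]+", text.lower()))

@dataclass
class SearchWordSelectionUnit:
    """Parses input text and removes predetermined words (hypothetical list)."""
    stop_words: set = field(default_factory=lambda: {"a", "the", "of", "in"})

    def parse(self, text: str) -> list:
        words = re.findall(r"[a-z0-9]+", text.lower())
        return [w for w in words if w not in self.stop_words]

@dataclass
class AreaSpecificationUnit:
    """Names the document field treated as the specified area."""
    area: str = "header"

@dataclass
class SpecificAreaOccurrenceUnit:
    """Connected to the selection unit and the area specification unit."""
    selector: SearchWordSelectionUnit
    area_spec: AreaSpecificationUnit

    def values(self, text: str, database: list) -> dict:
        result = {}
        for word in self.selector.parse(text):
            in_area = sum(1 for d in database
                          if word in _tokens(d.get(self.area_spec.area, "")))
            anywhere = sum(1 for d in database
                           if word in _tokens(" ".join(d.values())))
            result[word] = in_area / anywhere if anywhere else 0.0
        return result

database = [
    {"header": "Text search", "body": "Search engines index text."},
    {"header": "Storage", "body": "Text is stored on disk."},
]
unit = SpecificAreaOccurrenceUnit(SearchWordSelectionUnit(), AreaSpecificationUnit("header"))
print(unit.values("the search of text", database))  # {'search': 1.0, 'text': 0.5}
```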

46. The apparatus for processing text data according to claim 45 wherein the specified area 
is a header area. 

47. The apparatus for processing text data according to claim 46 wherein said specific area
occurrence determination unit determines the specific area occurrence value according to a 
following equation: 

the specific area occurrence value = 

a number of documents including the word candidate in the header area/ 

a number of documents including the word candidate in an entire portion of the

predetermined text database. 

48. The apparatus for processing text data according to claim 45 wherein the specified area 
is a summary area. 

49. The apparatus for processing text data according to claim 48 wherein said specific area 
occurrence determination unit determines the specific area occurrence value according to a
following equation:

the specific area occurrence value = 

a number of documents including the word candidate in the summary area / 
a number of documents including the word candidate in an entire portion of the 
predetermined text database.

50. The apparatus for processing text data according to claim 45 wherein the specified area 
is a combination of a header area and a summary area. 

51. The apparatus for processing text data according to claim 50 wherein said specific area
occurrence determination unit determines the specific area occurrence value according to a
following equation:

the specific area occurrence value = 

a number of documents including the word candidate in either one of the
summary area and the header area /

a number of documents including the word candidate in an entire portion of the 
predetermined text database. 

52. The apparatus for processing text data according to claim 50 wherein said specific area
occurrence determination unit determines the specific area occurrence value according to a
following equation:

the specific area occurrence value =

(a number of documents including the word candidate in the header area / 

a number of documents including the word candidate in an entire portion of the 

predetermined text database) + 

(a number of documents including the word candidate in the summary area /
a number of documents including the word candidate in an entire portion of the
predetermined text database).

53. The apparatus for processing text data according to claim 45 wherein said search word 
selection unit further determines a search word significance value based upon a following 

equation:

the search word significance value = 

a corresponding predetermined word weight X 
the specific area occurrence value, 
wherein the corresponding predetermined word weight is log (a total number of 
documents/ the number of documents in which the word candidate occurs).

54. The apparatus for processing text data according to claim 45 wherein said search word 
selection unit further determines a search word significance value based upon a following 
equation: 

the search word significance value =

a corresponding predetermined word weight X 
the specific area occurrence value X 

a number of occurrences of the word candidate within the text data. 

55. The apparatus for processing text data according to claim 45 further comprising a text
selection unit connected to said specific area occurrence determination unit for selecting
search words from the word candidates based upon the specific area occurrence value, said 
text selection unit extracting sentences from the predetermined text database based upon
the selected search words. 



56. The apparatus for processing text data according to claim 45 further comprising a
keyword extraction unit connected to said specific area occurrence determination unit for
selecting keywords from the word candidates based upon the specific area occurrence 
value. 

57. The apparatus for processing text data according to claim 45 further comprising: 
a keyword extraction unit connected to said specific area occurrence
determination unit for selecting keywords from the word candidates based upon the
specific area occurrence value; and 

a summary generation unit connected to said keyword extraction unit for 
generating a summary from the predetermined text database based upon the selected 
keywords.

58. The apparatus for processing text data according to claim 45 further comprising: 

a classification keyword selection unit connected to said specific area occurrence 
determination unit for selecting classification keywords from the word candidates based 
upon the specific area occurrence value; and

a classification unit connected to said classification keyword selection unit for 
classifying the predetermined text database based upon the selected classification 
keywords. 

59. The apparatus for processing text data according to claim 45 further comprising:

a database occurrence determination unit connected to said search word selection 
unit for determining a first text database occurrence value of the word candidates in a first 
text database and a second text database occurrence value of the word candidates in a 
second text database, said database occurrence determination unit further determining a
database occurrence value based upon the first text database occurrence value and the
second text database occurrence value in a predetermined manner, wherein said search
word selection unit selects search words from the word candidates based upon in part the
database occurrence value; and 

a text selection unit connected to said search word selection unit for extracting 
sentences from the predetermined text database based upon the selected search words. 

60. The apparatus for processing text data according to claim 59 wherein said database 
occurrence determination unit determines the database occurrence value based upon a 
following equation: 

the database occurrence value = 

(the second text database occurrence value / 

a total number of documents in the second text database) - 

(the first text database occurrence value / 

a total number of documents in the first text database). 

61. The apparatus for processing text data according to claim 59 wherein said database 
occurrence determination unit determines the database occurrence value based upon a 
following equation: 

the database occurrence value = 

(the second text database occurrence value / 

a total number of documents in the second text database) / 

(the first text database occurrence value / 

a total number of documents in the first text database). 

62. The apparatus for processing text data according to claim 45 wherein said search word 
selection unit further determines a search word significance value based upon a following 
equation: 

the search word significance value = 

the corresponding predetermined word weight X 
the database occurrence value, 
wherein the corresponding predetermined word weight is log (a total number of
documents/ the number of documents in which the word candidate occurs). 

63. An apparatus for processing text data, comprising:
an input unit for inputting text data; 

a search word selection unit connected to said input unit for parsing the text data 
into word candidates, said search word selection unit removing predetermined words from 
the word candidates; 

a database occurrence determination unit connected to said search word selection
unit for determining a first text database occurrence value of the word candidates in a first 
text database and a second text database occurrence value of the word candidates in a 
second text database, said database occurrence determination unit further determining a 
database occurrence value based upon the first text database occurrence value and the 
second text database occurrence value in a predetermined manner, wherein said search 
word selection unit selects search words from the word candidates based upon in part the 
database occurrence value; and 

a text selection unit connected to said search word selection unit for extracting 
sentences from the predetermined text database based upon the selected search words. 

64. The apparatus for processing text data according to claim 63 wherein said database 
occurrence determination unit determines the database occurrence value based upon a 
following equation: 

the database occurrence value = 

(the second text database occurrence value / 
a total number of documents in the second text database) - 
(the first text database occurrence value / 
a total number of documents in the first text database). 

65. The apparatus for processing text data according to claim 63 wherein said database 
occurrence determination unit determines the database occurrence value based upon a 
following equation: 

the database occurrence value =

(the second text database occurrence value /
a total number of documents in the second text database) / 
(the first text database occurrence value / 
a total number of documents in the first text database). 

66. The apparatus for processing text data according to claim 63 wherein said search word 
selection unit further determines a search word significance value based upon a following 
equation: 

the search word significance value = 

the corresponding predetermined word weight X 

the database occurrence value, 
wherein the corresponding predetermined word weight is log (a total number of 
documents/ the number of documents in which the word candidate occurs). 